{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "bccd47fc",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/vector_stores/Lantern.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "db0855d0",
   "metadata": {},
   "source": [
    "# Lantern Vector Store\n",
    "In this notebook we are going to show how to use [Postgresql](https://www.postgresql.org) and  [Lantern](https://github.com/lanterndata/lantern) to perform vector searches in LlamaIndex"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "e4f33fc9",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "712daea5",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "!pip install psycopg2-binary llama-index asyncpg \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c2d1c538",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index import SimpleDirectoryReader, StorageContext\n",
    "from llama_index.indices.vector_store import VectorStoreIndex\n",
    "from llama_index.vector_stores import LanternVectorStore\n",
    "import textwrap\n",
    "import openai"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26c71b6d",
   "metadata": {},
   "source": [
    "### Setup OpenAI\n",
    "The first step is to configure the openai key. It will be used to created embeddings for the documents loaded into the index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "67b86621",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"<your_key>\"\n",
    "openai.api_key = \"<your_key>\""
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "eecf4bd5",
   "metadata": {},
   "source": [
    "Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6df9fa89",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f7010b1d-d1bb-4f08-9309-a328bb4ea396",
   "metadata": {},
   "source": [
    "### Loading documents\n",
    "Load the documents stored in the `data/paul_graham/` using the SimpleDirectoryReader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c154dd4b",
   "metadata": {},
   "outputs": [],
   "source": [
    "documents = SimpleDirectoryReader(\"./data/paul_graham\").load_data()\n",
    "print(\"Document ID:\", documents[0].doc_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7bd24f0a",
   "metadata": {},
   "source": [
    "### Create the Database\n",
    "Using an existing postgres running at localhost, create the database we'll be using."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e6d61e73",
   "metadata": {},
   "outputs": [],
   "source": [
    "import psycopg2\n",
    "\n",
    "connection_string = \"postgresql://postgres:postgres@localhost:5432\"\n",
    "db_name = \"postgres\"\n",
    "conn = psycopg2.connect(connection_string)\n",
    "conn.autocommit = True\n",
    "\n",
    "with conn.cursor() as c:\n",
    "    c.execute(f\"DROP DATABASE IF EXISTS {db_name}\")\n",
    "    c.execute(f\"CREATE DATABASE {db_name}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8883b6b0-8a1e-42ca-9134-ade42285e7dc",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.embeddings import OpenAIEmbedding\n",
    "from llama_index import ServiceContext\n",
    "\n",
    "# Setup global service context with embedding model\n",
    "# So query strings will be transformed to embeddings and HNSW index will be used\n",
    "embed_model = OpenAIEmbedding()\n",
    "service_context = ServiceContext.from_defaults(embed_model=embed_model)\n",
    "\n",
    "from llama_index import set_global_service_context\n",
    "\n",
    "set_global_service_context(service_context)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0232fd1",
   "metadata": {},
   "source": [
    "### Create the index\n",
    "Here we create an index backed by Postgres using the documents loaded previously. LanternVectorStore takes a few arguments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8731da62",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sqlalchemy import make_url\n",
    "\n",
    "url = make_url(connection_string)\n",
    "vector_store = LanternVectorStore.from_params(\n",
    "    database=db_name,\n",
    "    host=url.host,\n",
    "    password=url.password,\n",
    "    port=url.port,\n",
    "    user=url.username,\n",
    "    table_name=\"paul_graham_essay\",\n",
    "    embed_dim=1536,  # openai embedding dimension\n",
    ")\n",
    "\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, storage_context=storage_context, show_progress=True\n",
    ")\n",
    "query_engine = index.as_query_engine()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ee4473a-094f-4d0a-a825-e1213db07240",
   "metadata": {},
   "source": [
    "### Query the index\n",
    "We can now ask questions using our index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a2bcc07",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = query_engine.query(\"What did the author do?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8cf55bf7",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68cbd239-880e-41a3-98d8-dbb3fab55431",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = query_engine.query(\"What happened in the mid 1980s?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fdf5287f",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3bed9e1",
   "metadata": {},
   "source": [
    "### Querying existing index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e6b2634b",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store = LanternVectorStore.from_params(\n",
    "    database=db_name,\n",
    "    host=url.host,\n",
    "    password=url.password,\n",
    "    port=url.port,\n",
    "    user=url.username,\n",
    "    table_name=\"paul_graham_essay\",\n",
    "    embed_dim=1536,  # openai embedding dimension\n",
    "    m=16,  # HNSW M parameter\n",
    "    ef_construction=128,  # HNSW ef construction parameter\n",
    "    ef=64,  # HNSW ef search parameter\n",
    ")\n",
    "\n",
    "# Read more about HNSW parameters here: https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md\n",
    "\n",
    "index = VectorStoreIndex.from_vector_store(vector_store=vector_store)\n",
    "query_engine = index.as_query_engine()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7075af3-156e-4bde-8f76-6d9dee86861f",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = query_engine.query(\"What did the author do?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b088c090",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55745895-8f01-4275-abaa-b2ebef2cb4c7",
   "metadata": {},
   "source": [
    "### Hybrid Search  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91cae40f-3cd4-4403-8af4-aca2705e96a2",
   "metadata": {},
   "source": [
    "To enable hybrid search, you need to:\n",
    "1. pass in `hybrid_search=True` when constructing the `LanternVectorStore` (and optionally configure `text_search_config` with the desired language)\n",
    "2. pass in `vector_store_query_mode=\"hybrid\"` when constructing the query engine (this config is passed to the retriever under the hood). You can also optionally set the `sparse_top_k` to configure how many results we should obtain from sparse text search (default is using the same value as `similarity_top_k`). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "65a7e133-39da-40c5-b2c5-7af2c0a3a792",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sqlalchemy import make_url\n",
    "\n",
    "url = make_url(connection_string)\n",
    "hybrid_vector_store = LanternVectorStore.from_params(\n",
    "    database=db_name,\n",
    "    host=url.host,\n",
    "    password=url.password,\n",
    "    port=url.port,\n",
    "    user=url.username,\n",
    "    table_name=\"paul_graham_essay_hybrid_search\",\n",
    "    embed_dim=1536,  # openai embedding dimension\n",
    "    hybrid_search=True,\n",
    "    text_search_config=\"english\",\n",
    ")\n",
    "\n",
    "storage_context = StorageContext.from_defaults(\n",
    "    vector_store=hybrid_vector_store\n",
    ")\n",
    "hybrid_index = VectorStoreIndex.from_documents(\n",
    "    documents, storage_context=storage_context\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6f8edee4-6c19-4d99-b602-110bdc5708e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "hybrid_query_engine = hybrid_index.as_query_engine(\n",
    "    vector_store_query_mode=\"hybrid\", sparse_top_k=2\n",
    ")\n",
    "hybrid_response = hybrid_query_engine.query(\n",
    "    \"Who does Paul Graham think of with the word schtick\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bd454b25-b66c-4733-8ff4-24fb2ee84cec",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(hybrid_response)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
